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Abstract 

Recombination is introduced into Eigen's theory of quasispecies evolution. Com- 
paring numerical simulations of the rate equations in the non-recombining and re- 
combining cases show that recombination has a strong effect on the error threshold 
and, for a wide range of mutation rates, gives rise to two stable fixed points in the dy- 
namics. This bi-stability results in the existence of two error thresholds. We prove 
that, under some assumptions on the fitness landscape but for general crossover 
probability, a fixed point localized about the sequence with superior fitness is glob- 
ally stable for low mutation rates. 

1 Introduction 

The quasispecies concept was introduced by Eigen in 1971 P to describe populations 
of self-replicating molecules. A quasispecies is an equilibrium distribution of closely 
related gene sequences, localized in sequence space around one or a few sequences of 
high fitness. The quasispecies model can be viewed as a simple framework that contains 
all the basic ingredients of Darwinian evolution. In particular, it captures the critical 
relation between mutation rate and information transmission ^ [2j. The behavior of 
these systems has been extensively studied, see for instance j2 HI El IH El • Quasispecies 
have also been fruitfully studied using concepts and techniques from statistical physics, 
see, e.g., 0(71111. 

In the quasispecies model, the population dynamics is described on the gene level, 
and a fitness landscape ^ is used to define the degree of adaptation directly from the 
gene sequence. Considerable amounts of work has gone into defining models of rugged 
landscapes and analyzing their consequences for the evolutionary dynamics (e.g. jlOLIlH 

dUSllIlI). 

In this paper we introduce recombination into the quasispecies model. With some 
exceptions (see, e.g., |^E1E1E]) previous work on quasispecies has only considered 
non-recombining populations where variation is created only by mutation. However, 
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most species in nature use crossover during replication, at least to some degree, which 
makes this an important case to study. Besides applications to evolutionary biology, 
developing an understanding for the dynamics of systems under recombination is also 
important for gaining theoretical insights into the behavior of genetic algorithms j2j in 
combinatorial optimization problems. 

Recombination introduces a non-linearity in the rate equations, which in general 
results in the appearance of two stable fixed points. For a wide range of mutation 
rates this divides the space of initial distributions into two regions: one where the 
population converges to a distribution localized around the genome with highest fitness, 
and another where it converges to an approximately uniform distribution. This behavior 
is qualitatively different from that of non-recombining populations. Another interesting 
observation is the shift in the error threshold. 

The main contribution of the paper is a proof that, for a class of fitness landscapes 
(see Section 121 for details), independent of the crossover probability, there exist exactly 
one globally stabile fixed point. The single peaked fitness landscape is a special case 
that belongs to this class. 

The rest of this paper is organized as follows: 

Section |21 gives a short review of quasispecies evolving under mutation only, for 
comparison with the recombination case. In section |21 we introduce the rate equations 
for quasispecies with mutation and recombination, and formulate a condition for the 
equilibrium distribution as a generalized non-linear eigenvalue problem. 

Section 0] contains results from numerical simulations of the rate equations for a 
recombining population. We demonstrate how the equilibrium distribution changes with 
mutation rate for different initial distributions. As in the non-recombining phase 
transition from a localized to a uniform distribution occurs when the mutation rate is 
increased. The dependence of the phase transition point on the initial distribution is 
investigated. 

In section El we prove that, under some assumptions on the fitness landscape but 
without constraint on the crossover, when the mutation rate is low enough all initial 
distributions converge to a fixed point localized around the genome with highest fitness. 
Finally, section |B1 contains a discussion and conclusions. 

2 Quasispecies 

In this section we give a short review of relevant results for quasispecies with non- 
recombining replication ^ [2] , to allow us to compare with the results when recombina- 
tion is included. In the model, a self-replicating molecule is represented by a sequence 
of bases s^, (siS2 • • • Sn)- The bases are assumed to be binary {0, 1}, and all sequences 
have equal length n. A genome is then given by a binary string (011001 • ■ ■), which also 
can be represented by an integer fc (0 < fc < 2"). The space of all gene sequences in 
the model is called sequence space A quasi-species is defined as a distribution of 
sequences localized in sequence space. 

Selection in the quasispecies model is expressed in terms of a fitness landscape, which 
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is a function of the phenotype and the environment. The environment describes direct 
interactions with other organisms as well as the physical environment. In the quasispecies 
model we assume that the phenotype is directly determined by the genotype. There is 
no direct interaction between individuals in the population, only indirect competition 
for resources. The fitness landscape can then be expressed as a function of the genotype 
only. In the following, we only consider a simple landscape with a single sequence of 
high fitness Aq, called the master sequence, and with all other sequences i having equal 
fitness Ai < Aq. 

Mutations are described by Q^, the probability that replication of genome / gives 
genome k as off^spring. If the mutation rate per base, pm = ^ — where q is the copying 
accuracy per base, is assumed to be constant in time and independent of position in the 
genome, we obtain 

Qi = pi^'9"-''=^ = ^"(^)'" (1) 

where h^i is the Hamming distance between genomes k and i. 

The rate equations that describe the dynamics of the population are then given by 
(where denotes the relative concentration of species k): 

Xk = Qk^i^i - exk (2) 
where e = J2i ^i^i- The second term ensures the total normalization of the population 

{j:ixi = i). 

These diff'erential equations can be solved analytically [^31 [25. Equation © can be 
made linear through a change of variables and we can then use standard techniques to 
find Xk- If all the elements of the matrix are strictly positive, x^ always converges 
to a unique stable fixed point 123; given by the eigenvector corresponding to the largest 
eigenvalue / = e of the matrix Q^Ai. 

For a landscape where the fitness only depends on the Hamming distance from the 
master sequence, we can divide sequence space into error classes containing sequences 
with the same number of ones. The effective dimension of the system of equations ((21) 
can then be reduced from 2" to n + 1 by summing over error classes. In this way we 
obtain the new equations 

XK = Qk^lxl - EXK (3) 
L 

where the indices K and L denote error classes, and describes mutation probabilities 
between error classes rather than sequences. 

We now consider a fitness landscape with = 10, and Al = 1 for all L ^ 0. 
The sequences are indexed by their Hamming distance from the master sequence. The 
equilibrium distributions corresponding to diff'erent mutation rates, pm, are shown in 
figure n There is a sharp transition between a state where the population is localized 
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around the master sequence xq and a state where the distribution is approximately bi- 
nomial. This is the error catastrophe (or error threshold) of Eigen and coworkers. 

Fig. ^here. 

The error catastrophe occurs approximately when q"Ao/Ai = 1, or when the selective 
advantage of the master sequence, A^/Ai, is compensated by the finite probability < 1 
for the master sequence to replicate to itself. 

This observation is important for theories of prebiotic evolution of life. When polynu- 
cleotides replicate without replicase enzymes, the copying fidelity is unlikely to exceed 
0.99, which means that n cannot be larger than 100 (Ij. This is much smaller than 
coding regions for replicase enzymes, which are needed to increase the copying fidelity. 
This contradiction is often called Eigen's paradox. There have been several different 
attempts to resolve this problem, such as hyper-cycles 0. 

In the following sections we consider quasispecies where both recombination and 
mutation can occur during replication. The introduction of recombination will cause 
major changes in the population dynamics. As an example, we observe that the rate 
equations have multiple stable fixed points. The error threshold also also significantly 
shifted. 



3 Recombination 

The crossover operator, Tjj^, denotes the probability that parents / and m give rise to the 
offspring k in one recombination event |15[ ll7j. The crossover operator Tj,"^ depends on 
the crossover probability pc € [0,0.5], i.e., the probability per base pair for the reading 
process to switch from one parent to the other. As an example, Pc = 0.5 (uniform 
crossover) means that each position in the genome is chosen with equal probability from 
each parent. Another extreme case is pc = which means that the offspring inherits all 
its genome from a single randomly chosen parent. 

The crossover operator has the following properties 



< r/™ < 1 (4) 

Y.Tt^ = l \/l,m (5) 
it 

For uniform crossover we can write T^"* explicitly as 

^ j 2-'^"" if 0{k,l,m) = l 
^ I if 0{k,l,m) = 

where 0{k, I, m) = 1 if at each position where the parents genome / and m are identical, 
the same base also appears in the child genome k, else 0{k,l,m) = 0. New genes can 
only be created by mutations. 
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The most reahstic and interesting population dynamics involves both recombina- 
tion and mutations. In our model we have only recombining individuals and the point 
mutations will come in as limited reading accuracy in the crossover process. We have 
chosen to let the number of offsprings depend on both parents. The rate equations for 
a population of sequences which both recombine and mutate are then given by 

Xk = yk^^lXlAmXm - CXk (7) 

Im 

where = Yl,i Qk'^i^ and c = {J2i ^ixi)'^ (which is used to normalize the total growth 
as before). 

The rate equations in the case of recombination are in general much harder to analyze 
than in the case of pure mutations. The crossover operator acts on pairs of sequences, 
which gives rise to a non-linearity in the growth term. We are mainly interested in 
the equilibrium distribution, i.e., the concentration of sequences after long time. In the 
pure mutation case the stable equilibrium distribution could be calculated by solving a 
standard eigenvalue problem. When recombination is used the fixed points of the rate 
equations Q, y, are solutions to the generalized eigenvalue problem: 

Y^Vl^'AmAmym = XVk yk (8) 

Im 

All normalized yi = 1) solutions to (jHI are also fixed points to the rate equations, 
since summing over k gives the relation A = (X^; Aiyif' = c. There may however exist 
solutions to equation (jH)) which cannot be normalized to a vector of concentrations, since 
all elements must be non-negative. 

In general there exists more than one solution to (jH)) which can be normalized to a 
concentration vector. It turns out that these multiple fixed points can be stable, see 
section [l] One of the most important differences between the non-recombining and the 
recombining case is in fact the uniqueness of the equilibrium distribution. As we will see 
in section \^ the equilibrium distribution of the rate equation ((T)) depends on the initial 
distribution (as was previously observed in other models, e.g. UHl)- This behavior is 
very different from the pure mutation case, where all initial distributions converge to a 
unique stable fixed point, as discussed in section |2j 

However, in Section[21we present a proof that in the zero mutation rate limit, the only 
globally stabile fixedpoint corresponds to a population totally localized on the fitness 
peak. 

The dimension of sequence space scales exponentially with the number of bases in 
the genome. In the non-recombining case we saw that the degrees of freedom in the rate 
equations ((TJ could be reduced from 2" to n + 1 by dividing the sequences into error 
classes. This symmetry is in general broken by recombination (see figure [21). The only 
non trivial case when the rate equation (jT)) preserves the symmetry between the error 
classes, is when pc = 0.5 (uniform crossover). In this case we can write the reduced rate 
equations as 



5 



XK = Vk^ALXLAMXM - CXK (9) 

L, M 

where we use the same notation as in equation For pc = 0.5 and pm = the 

transition probabihties between error-classes V^^^ are given by 

^M+L+2min(n-L-M,0) ( ^ \ 

yLM ^ l^.nin(i.,„)-2sazH j ^^^^ 

(m) 

In the more reahstic case when pc < 0.5, we either have to be satisfied with rather 
smah genome sizes or need to use some approximation method. 

Fig. Inhere. 



4 Numerical Results 

Fig. I^lhere. 

In this section we present results from computer simulations of the rate equations 
0. We concentrate on the asymptotic behavior as time goes to infinity, and do not 
consider detailed dynamics of the transients. Equilibrium distributions are obtained by 
a straight-forward simulation of the differential equations. All the simulations in this 
section use uniform crossover (pc = 0.5), which preserves the error class symmetry. 

We now consider a fitness landscape with an isolated peak (Aq = 10, and = 1 
VL 7^ 0). The equilibrium distributions for recombining and non-recombining popula- 
tions are presented in figure |31 where the initial distribution is binomial over the error 
classes. The phase transition between the localized and non-localized state is extremely 
sharp in the recombination case. The phase transition occurs at a mutation rate which 
is orders of magnitude lower than in the non-recombining population. 

Figure |1] shows the equilibrium distribution of recombination dynamics with the 
same fitness landscape as figure El the only difference is the initial distribution which 
is completely localized to the master sequence {xq = 1, and xk = VK / 0). We see 
that the equilibrium distributions depend strongly on the initial distributions. The error 
threshold is still lower than in the pure mutation case, however the difference is much 
smaller. In general recombination in single peak fitness landscapes tends to mix the gene 
sequences and push the population above the error threshold. 

Fig. 0]here. 
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Figure ini and [7| show how the equihbrium distributions and the phase transition point 
varies with the initial distribution. The initial distributions are given by 



This gives a uniform distribution for s = and a distribution concentrated to the master- 
sequence for large s. The graphs in figure[51show the initial distributions for some discrete 
parameter values, s = 0, 1, • ■ ■ , 5. Figure El shows that there are two different regions in 
the space of initial distributions, converging to two different fixed points. In one corner 
of this space all the genomes are master-sequences. If the concentration vector starts 
out far from this corner it will not converge into the corner unless the mutation rate 
is extremely low (as illustrated in figure [31 or by the case of s E [0,1] in figure I7|). If 
the initial distribution starts near the corner it will converge into the corner for much 
larger mutation rates (see figure |1] or the region s G [3,5] in figure [Tj). Figure [7| shows 
the location of the phase transition point for different initial distributions defined by 
equation 1111 This phase diagram shows how the border between the two regions in 
figure El changes with mutation rate. A change of pm from 9 • 10~^ to 0.055, changes the 
border from s = 1 to 3. When the mutation rate is too low or too high only one region 
exists corresponding to a single stable fixed-point. 

That there is an upper bound on the mutation rate where a stable localized fixed 
point ceases to exist is obvious. The existence of a lower bound, below which all ini- 
tial distributions converge to a localized distribution, is however non-trivial. This lower 
bound always exists and we will present a proof of this in section [51 

Fig. Ohere. 
Fig. El here. 
Fig. [71 here. 

The main conclusion to be drawn from these numerical simulations is that, for a wide 
range of mutation rates, one finds a coexistence of two different equilibrium distributions 
to the rate equations involving both recombination and point mutations. Which of these 
fixed points the population will converge to depends on the initial distribution. This 
means that the space of initial distributions consists of two regions, with the border 
between this regions depending on the mutation rate. The whole range of mutation 
rates where a localized fixed point exists is however lower than the phase transition 
point in the non-recombining case. This shows that a recombining population is more 
sensitive to mutation than a non-recombining one on a single peak landscape. Similar 
conclusions have been reached in a simpler model by Bergman and Feldman ^Sl . Similar 
results have also been shown in other work, see e.g., |15j . 





7 



5 Existence of a single fixed-point at zero mutation rate. 



In this section we investigate the behavior of the rate equations when —^0^. In sec- 
tion |1] it was shown numerically that at very low mutation rates, ah initial distributions 
converge to a highly localized equihbrium distribution. Here we show that this region 
always exists for fitness landscapes fulfilling certain assumptions, to be specified below. 

The idea behind the proof is to study the dynamics of one position or loci in the 
genome and sum over all possibilities at the other positions. Let ^o^'"'*^ denote all 
genomes of length N that contain the sequence a starting at position 1 < i < — n. 



where a is an index coding for genomes of length n. For example; 5; 



(10,2,1) 



will be all 



genomes of length 10 that starts with (11). We also introduce the notation x)^"' , where 
simply indicates the genome length and affects decoding of the index k. We can now 
write the rate equations Q as 



X 



(N) 



Lm \ I / 



(12) 



The crossover operator has the following property 



= r,^^for/G4^'"''\m 



where no assumptions on the crossover probability in made. 

.(TV)/ 



(13) 



Since the point mutation operator Q, has the same property, so will the combined 



operator V, 



(N)lm 



We can now use this property and sum the rate equations over all 



sequences m ba 



E 



fees, 



(N,l,i) 



E iEv;^"^"Axr^™,xi-) 



k&S, 



(JV.l.i) \l,m 



(14) 



EM 



E M""^ E A. 



Pa 



i&s 



(N,l,i) 



(TV) 



mX^ 



( 



E E M 

/3 ;gs(iV,l..) 



(TV) 



.(1) 



(JV,l,i) 



(15) 



(16) 



The following, compact, notation is now introduced: 
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E = {"i (17) 

^ ' \ if /? = 1 ^ ^ 



Eq. El simplifies to 



4^) = ,(A«)\A«A« + (l-,)(A«)^-(A? + Af))^« (18) 
x'i^ = l-x{,^^ (19) 
which, in the hmit g ^ 1~, simphfy to 



(A« + A«) (A«xS^)-A«4^: 



= 1-xJ^^ (20) 
To continue the fohowing assumption on the fitness landscape is needed: 



Ai < Am ifleS^'^'\meSQ'^'' (21) 
We further assume that there exist a gene sequence M G Sq'^'^ such that 



Ai < Am yi G 5f (22) 

These two assumptions mean that no sequences with a zero at position i have a fitness 
inferior to any sequence with a one at this position, and that there exist at least one 
sequence with with a zero at position i with strictly larger fitness than the sequences 
with a one at this position. Under these assumptions, the following inequalities hold 



< (23) 

where Aq^^^,^ (A^^^^^) denotes the minimum (maximum) fitness of the sequences with 
a (1) at position i. We further note that at least one of the inequalities in Eg. 1231 is 
strict unless x^'* = for all M fulfilling Eq. I22L Eq. [23 implies the following estimate 

^0 "* — (Ao,Lm ~ '^l^max) ^1 ^4 ^ (^4) 
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with equality if and only if x\.j = for all M fulfilling Eq. [22lor = or Xg = 0. 
Note however that Xq^"* = 0, x^^^ = 1 is an (unstable) fixed-point since no mutations 
implies no inventions of new genes. 

Prom Eq. |^ it is clear that the rate equations will converge to a state where all 
sequences has a zero at position i. This fixed point is unstable and it is clear that they 
cease to exist when the mutation rate is non-zero. 

We conclude that all sequences with a one at position i will diminish after long time, 
and can therefore be be discarded. We can then search for a new position such that the 
remaining half of the fitness landscape satisfies the assumptions in Ea. l21l and l22l If this 
can be repeated (possibly interchanging the zero and one as being superior, since this 
choice is arbitrary) until the last position, we conclude that the rate equations converge 
to a state completely dominated by genomes with the same sequence (which necessarily is 
a global optimum). Loosely, we may describe such fitness landscapes as having a natural 
ordering of the importance of its loci. One example of a fitness landscape fulfilling these 
requirements is a single peaked fitness landscape, describing a degenerate case where the 
positions can be chosen arbitrarily. 

6 Conclusions and discussion 

We have studied Eigen's quasispecies model extended to include crossover as well as mu- 
tations. The numerical simulations of section 0] show that there are significant changes 
in the dynamics of the rate equations because of the non-linearity arising from the intro- 
duction of crossover. For a wide range of mutation rates, two simultaneous stable fixed 
points exist. One fixed point is concentrated around the master sequence while the other 
describes a uniform distribution. For extremely low and rather high mutation frequen- 
cies there is only a single fixed point, corresponding to the localized distribution and 
the uniform one, respectively. The mutation frequency at the point where the localized 
fixed point ceases to exist is still lower than the error threshold without recombination. 

In this paper we prove that, for a class of fitness landscapes having a hierarchical 
ordering of the loci in the genome (see Section |3 for details), a single globally stabile fixed 
point exist in the limit of zero mutation rate. Since the proof is valid for all crossover 
probabilities, the only natural generalization is to expand the class of fitness landscapes. 
A possible generalization of the technique in Section [S] could be to prove that; within 
larger class of fitness landscapes, for any point in time, i.e., for any distribution x'^^\ 
we can always find a position i such that Eq. |^ is fulfilled. The position i would now 
depend on the distribution (which changes in time), not only the fitness landscape which 
is the case in our proof. Technically however, this generalization is non-trivial since the 
changing of position with the distribution makes it complicated to argue that all locus 
in the global fixedpoint will dominate completely in the infinite time limit. 
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Figure 1: The relative equilibrium concentrations of the 51 different error classes for 
sequences of length 50 for different mutation rates. The fitness landscape has a single 
peak Aq = 10, and Al = 1 0. The error catastrophe occurs around Pm ~ 0.045. 
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Figure 2: The equilibrium distribution for the concentration of genomes at different 
mutation rates. The genomes have length 4 and the crossover probability Pc is 0.1. 
There is a small difference in concentration between genomes in the same error class. 
Genomes 1 and 4 have the same concentration due to the mirror symmetry in the binary 
strings. The symmetry breaking tends to increase with genome length. 
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Figure 3: The equilibrium distributions for recombination (upper graph) and pure muta- 
tion (lower graph) dynamics, when the initial distribution is binomial between the error 
classes. The gene sequences has length 25 and the fitness landscape has an isolated peak 
{Aq = 10, and = 1 VL 7^ 0). 
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Figure 4: The equilibrium distribution for a recombining population when the initial 
distribution is concentrated to the master sequence, xq = 1, and xk = VET 0. The 
gene sequences have length 25 and the fitness landscape has an isolated peak (^o = 10) 
and = 1 VL 7^ 0). 




Figure 5: Initial distributions for different values of the parameter s. 
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Figure 6: Equilibrium distributions for different values of the parameter s. The copy- 
ing fidelity is constant q = 0.97. Note that there are only two different equilibrium 
distributions. 




Figure 7: The copying fidelity at the phase-transition for different initial distributions 
Xfc(s) (as defined in equation [TT|) . The gene sequence has length 25 and the fitness 
landscape has an isolated peak {Aq = 10, and Al = 1 ML ^ 0). 
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